I give a brief introduction to the scope of lattice QCD calculations in 
our effort to extract the fundamental parameters of the standard model. 
of CKM matrix elements from measurements of form factors for semi- 
Second, I present the status of results for the kaon B parameter relevant 

1. Introduction 



Current high energy experiments show that the fundamental building blocks of 
matter are quarks, gluons, leptons, photons, weak bosons and the elusive Higgs particle. 
The interactions between these particles are described by a set of theories, known col- 
lectively as the Standard Model. While this model has been immensely successful, and 
present data do not demand enhancements to the model or a new theory altogether, it 
is still incomplete. Experimentalists have yet to discover the top quark, the $\tau$ neutrino 
and the Higgs boson. On the other hand it has proven very difficult to extract the pre- 
dictions of the Standard Model when the interactions among the elementary particles are 
strong. This happens in processes in which quarks interact through the exchange of gluons 
carrying 4-momenta less than a few GeV. Such processes cannot be calculated reliably 
using perturbation theory as there is no small expansion parameter. For this reason it has 
proven extremely difficult to make precise quantitative tests of the theory, such as making 
quantitative predictions that can be compared to experiments. Even twenty years after the 
formulation of QCD as the theory of strong interactions this state of affairs persists. What 
one needs are non-perturbative tools to include strong interaction effects. At present the 
most promising approach is to carry out large-scale numerical simulations using a lattice 
version of the gauge theory. In this talk I hope to describe the computational challenge 
presented by lattice QCD and the progress we have made. 

Let me begin by enumerating the 24 parameters of the standard model. 

Parameters                 Number   Comments
Masses of quarks                    u, d, s light
                                    c, b heavy
                                    t > 90 GeV ??
Masses of leptons                   $m_e$, $m_\mu$, $m_\tau$ known
                                    ??
Mass of $W^\pm$                     81 GeV
Mass of $Z$                         92 GeV
Mass of gluons, $\gamma$
Mass of Higgs                       Not Found
Coupling $\alpha_s$                 $\approx 1$ for Energy < 1 GeV
Coupling $\alpha_{em}$     1        1/137
Coupling $G_F$             1        $10^{-5}$ GeV$^{-2}$
Weak Mixing Angles         3        $\theta_{12}$, $\theta_{23}$, $\theta_{13}$
CP Violating phase         1        $\delta$
Strong CP parameter        1        $\Theta = 0$ ??

Of these parameters the ones whose determination requires input from lattice QCD 
are the masses of light quarks, $m_u$, $m_d$, $m_s$, the strong coupling $\alpha_s$, the weak mixing angles 
and the CP violating phase $\delta$, and the strong CP parameter $\Theta$. Precise determination of 
their values will either validate the standard model or provide clues to new physics. 

The weak mixing angles and the CP violating phase $\delta$ need some introduction. 
These parameters arise because quarks are not eigenstates of the weak interactions. The 
mixing between flavors is described by the $3 \times 3$ Cabibbo-Kobayashi-Maskawa (CKM) matrix 
$V$, 

$$ V = \begin{pmatrix} V_{ud} & V_{us} & V_{ub} \\ V_{cd} & V_{cs} & V_{cb} \\ V_{td} & V_{ts} & V_{tb} \end{pmatrix} . $$

Here, for example, $V_{ub}$ is the strength of the $b \to u$ flavor transformation as a result of charged 
$W$ exchange. For 3 generations $V^\dagger V = 1$ and the matrix can be written in terms of 4 
independent parameters, the 3 angles $\theta_{12}$, $\theta_{23}$ and $\theta_{13}$ and the CP violating phase $\delta$ as [1] 

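$$ \begin{pmatrix}
c_{12} c_{13} & s_{12} c_{13} & s_{13} e^{-i\delta} \\
-s_{12} c_{23} - c_{12} s_{23} s_{13} e^{i\delta} & c_{12} c_{23} - s_{12} s_{23} s_{13} e^{i\delta} & s_{23} c_{13} \\
s_{12} s_{23} - c_{12} c_{23} s_{13} e^{i\delta} & -c_{12} s_{23} - s_{12} c_{23} s_{13} e^{i\delta} & c_{23} c_{13}
\end{pmatrix} $$

where $c_{ij} = \cos\theta_{ij}$ and $s_{ij} = \sin\theta_{ij}$ for $i, j = 1, 2, 3$. A non-zero value of $\delta$ gives rise to 
CP violation in weak decays. 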
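
As a quick numerical check of the parameterization above, the following minimal sketch builds the matrix from three angles and a phase and measures how far $V^\dagger V$ is from the identity. The function names and the angle values used in the usage note are illustrative assumptions, not values taken from this talk.

```python
import cmath
import math


def ckm_matrix(theta12, theta23, theta13, delta):
    """Return the 3x3 mixing matrix in the three-angle, one-phase form."""
    c12, s12 = math.cos(theta12), math.sin(theta12)
    c23, s23 = math.cos(theta23), math.sin(theta23)
    c13, s13 = math.cos(theta13), math.sin(theta13)
    ep = cmath.exp(1j * delta)   # e^{+i delta}
    em = cmath.exp(-1j * delta)  # e^{-i delta}
    return [
        [c12 * c13, s12 * c13, s13 * em],
        [-s12 * c23 - c12 * s23 * s13 * ep,
         c12 * c23 - s12 * s23 * s13 * ep,
         s23 * c13],
        [s12 * s23 - c12 * c23 * s13 * ep,
         -c12 * s23 - s12 * c23 * s13 * ep,
         c23 * c13],
    ]


def unitarity_defect(v):
    """Largest absolute entry of (V^dagger V - 1)."""
    worst = 0.0
    for i in range(3):
        for j in range(3):
            entry = sum(v[k][i].conjugate() * v[k][j] for k in range(3))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(entry - target))
    return worst
```

For example, `unitarity_defect(ckm_matrix(0.22, 0.04, 0.004, 1.0))` should be at the level of floating-point rounding for any choice of the four (here hypothetical) parameters, which is the statement that the four-parameter form is unitary by construction.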

The strong CP violating parameter $\Theta$ arises because there is no symmetry or dy- 
namical argument to rule out a term like $\mathcal{L}_\Theta = (\Theta g^2/32\pi^2) F\tilde{F}$ from the QCD Lagrangian. 
Even though this term is a total divergence its presence leads to observable consequences 
like CP violation because of instanton solutions in QCD. The best bound on this param- 
eter, $\Theta < 10^{-9}$, comes from measurements of the electric dipole moment of the neutron, 
$d_N < 1.2 \times 10^{-25}\, e$ cm [2]. 

The crucial matrix element needed in the theoretical analysis is of the pseudoscalar 
density $\bar u \gamma_5 u + \bar d \gamma_5 d + \bar s \gamma_5 s$ within the neutron, and lattice calculations hope to provide 
a non-perturbative estimate. At present the numerical technology is not sufficiently well 
developed to undertake this calculation; what needs to be done is described in Ref. [3] and 
I refer to it for details. 

To set the stage for the results presented later, let me give an outline of how 
lattice QCD interfaces with experimental data and theoretical predictions of the standard 
model to test the theory. The general form of the SM prediction for a process is an expression 
(which I will call the master equation) consisting of three parts: known factors times some 
function of the unknown parameters times the matrix element of the appropriate operator 
sandwiched between initial and final states. Thus for each process for which there exists 
accurate experimental data, knowing the value of the matrix element gives an equation 
of constraint for the remaining part involving the unknown parameters. Once a certain 
number of such calculations are in hand we can extract accurate values for all the unknown 
parameters. Thereafter the standard model can be used to make accurate predictions for 
other processes. In this talk I will demonstrate this strategy with two examples, semi- 
leptonic form-factors and the kaon B parameter, that are discussed in Sections 6 and 7 
respectively. 

I will assume that the reader is familiar with Monte Carlo methods and Lattice 
QCD. Those who are not should, at this point, read the excellent pedagogical introduction 
given by D. Toussaint at this meeting or the monograph by Creutz [4]. 

2. Errors in lattice calculations 

Lattice calculations rely on a Monte Carlo sampling of configurations generated on 
a discrete space-time grid. Correlation functions are calculated as a statistical average, and 
are composed of gauge variables defined on links and quark propagators calculated on these 
background gauge configurations. This procedure introduces statistical and systematic 
errors into the results, so in order for you to judge progress in the field it is important for 
me to first explain these sources of errors. 

2.1. Statistical errors 

There exist robust, though slow, algorithms for generating independent gauge 
configurations. The typical sample size has been at best ~ 200 independent configurations. 
The quality of the signal depends very much on the observable; however, for the best case of 
spectrum calculations this sample size is adequate to reduce errors to less than 10 percent. 

2.2. Finite box size errors 

The energy $E$ of a state in a finite box with periodic boundary conditions is shifted 
due to interactions with mirror sources. Lüscher has shown [5] that for large enough $L$ the 
corrections are exponentially damped as $\exp(-cEL)$ where $c \approx 1$ is a constant that depends 
on the state, but the onset of the exponential regime has to be determined numerically. 
Present calculations indicate that for $E_{\rm min} L > 4$ the asymptotic relation applies and that 
the errors are roughly a few percent. 

2.3. Finite lattice spacing errors 

The continuum action is the first term in a Taylor series expansion of the lattice 
action. At the classical level corrections start at $O(a)$ for the Wilson formulation of the 
Dirac term and $O(a^2)$ for staggered fermions. They are $O(a^2)$ for the gauge part. In 
addition there are $O(a)$ corrections in the operators used to probe the physics. These 
corrections can be large on accessible lattices (typically $a$ is in the range of 0.05 to 0.1 
fermi). There is considerable effort being made in the lattice community to reduce these 
errors by improving the lattice action and operators. It turns out that matrix element 
calculations are most severely affected by these $O(a)$ artifacts, which are at present the 
largest source of uncertainty. In spectrum measurements these errors are much smaller 
once $a < 0.1$ fermi. 

2.4. Extrapolations from heavier quarks 

The quark propagator is the inverse of the Dirac operator. In the limit $m_q \to 0$ 
iterative algorithms used to calculate the inverse face critical slowing down. Since physical 
$u$ and $d$ quark masses are very nearly zero, and because over 90% of the time in QCD 
simulations is spent in calculating the inverse, one has had to resort to extrapolating to 
the physical point from heavier masses (typically from $O(m_s)$ to $(m_u + m_d)/2 \approx m_s/25$). 
The functional form used in the extrapolation is usually derived using just the lowest order 
chiral perturbation theory. This procedure introduces systematic errors. 
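
To make the extrapolation step concrete, here is a minimal sketch of a lowest-order chiral extrapolation with two data points. It assumes $M_\pi^2$ is linear in the quark mass and, for Wilson fermions, that the quark mass is linear in $1/\kappa$ (a standard relation, but an assumption here since it is not spelled out in this talk). The function names are illustrative; the inputs in the usage note are the two light-quark points quoted in Section 6.2.

```python
def linear_zero_crossing(x1, y1, x2, y2):
    """Return the x at which the straight line through two points hits y = 0."""
    slope = (y2 - y1) / (x2 - x1)
    return x1 - y1 / slope


def kappa_critical(kappa_a, mpi_a, kappa_b, mpi_b):
    """Estimate the hopping parameter at which the pion mass vanishes.

    Assumes M_pi^2 is linear in 1/kappa (lowest-order chiral behavior).
    Pion masses may be in any common unit, e.g. GeV.
    """
    inv_kappa_c = linear_zero_crossing(
        1.0 / kappa_a, mpi_a ** 2, 1.0 / kappa_b, mpi_b ** 2
    )
    return 1.0 / inv_kappa_c
```

Calling `kappa_critical(0.154, 0.690, 0.155, 0.560)` with the rounded pion masses of Section 6.2 gives a value close to the $\kappa_c = 0.15702$ quoted in Section 6.4. With only two points the fit has no degrees of freedom left, so it cannot test the assumed functional form, which is exactly the systematic error referred to above.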

2.5. Effects of dynamical fermions 

Simulations with dynamical fermions are prohibitively slow. As a result one works 
with the quenched approximation. This is a priori a totally uncontrolled approximation 
and I discuss it in more detail in the next Section. 

2.6. Relation between lattice and continuum operators 

In order to compare lattice results with those in the continuum we have to deter- 
mine the relative normalization of the lattice and continuum operators. This is usually 
done using 1-loop perturbation theory, which leaves open the possibility that the 2-loop 
effects are large or there are large non-perturbative effects. A recent analysis by Lepage 
and Mackenzie suggests that 1-loop perturbation theory works very well provided one uses 
an appropriate definition of the coupling constant and one takes care of unwanted ultravi- 
olet fluctuations using mean-field improvement [6]. So far the results from this approach 
agree very well with non-perturbative estimates in cases where the latter calculations are 
feasible. Further checks are under way. 

3. Quenched versus unquenched calculations 

In lattice QCD one calculates physical quantities as a statistical average over a set 
of background gauge configurations. For any given observable $O$, 

$$ \langle O \rangle = \frac{1}{Z} \int \prod_{i,\mu} dU_{i,\mu} \; O \; \det M[U] \; e^{-S_g} , $$

where $Z$ normalizes the average and $U_{i,\mu}$ is an SU(3) matrix defining the gauge field on a link in direction $\mu$ at site 
$i$. The background gauge configuration, $\{U_{i,\mu}\}$, is generated with Boltzmann weight 
$\det M[U]\, e^{-S_g}$. The factor $\det M[U]$ is the determinant of the Dirac operator and arises as 
a result of integrating over the quark degrees of freedom. Physically this factor takes into 
account the possibility that the QCD vacuum can create and annihilate quark/anti-quark 
pairs spontaneously. The determinant is a completely non-local object even though the 
initial Dirac action is only nearest-neighbor, and it is computationally very hard to include in 
the Monte Carlo procedure. It is therefore expedient to make an approximation, called 
the quenched approximation, in which one sets $\det M[U] = 1$. This corresponds to alter- 
ing the QCD vacuum by artificially turning off vacuum polarization effects. The question 
to address then is how serious this approximation is. 

The quenched vacuum possesses all three unique properties of QCD, i.e. confine- 
ment, asymptotic freedom and spontaneous chiral symmetry breaking. For this and other 
reasons it is expected that setting $\det M[U] = 1$ is a good approximation (on the level 
of 10%) for a large number of observables. Present simulations bear out this belief for 
sea quark masses roughly $> m_s$. While this is encouraging, it is by itself not sufficient 
to validate the approximation as sea quark effects in the same quantities are expected to 
be significant only for $m_q < m_s$. For this reason one has to proceed case by case, and 
eventually check using the full theory. 

These checks are made difficult by the presence of statistical and systematic errors 
(like finite lattice size and spacing, and extrapolation from heavier quarks) discussed 
above. Therefore, to expose the effects of vacuum polarization one needs to first bring these 
other errors down to the level of a few percent. Since the methodology for measuring many 
quantities is identical with or without the use of the quenched approximation to produce the 
statistical sample of background configurations, the strategy has been to first understand 
and control these errors in the simpler case. Thus the quenched approximation should be 
regarded as a test of our numerical techniques as well as a very good approximation to 
systematically improve upon. 

The quenched approximation does have its limitations. Recent analyses, using 
chiral perturbation theory, of proton and pion masses show that in the quenched approx- 
imation these quantities develop non-analytic terms in addition to the desired physical 
behavior [7] [8]. So far it has been hard to exhibit the presence of these unwanted terms 
in numerical data; the hope is that the coefficients of these terms become significant only 
at much smaller quark masses and extrapolations from heavier masses are still sensible. 
Clearly this aspect of the quenched approximation needs more attention. 

Let me end this discussion with a rough comparison of simulation time with and 
without dynamical fermions. With present algorithms the CPU requirements increase as 
for the quenched approximation and as L^*^'^ with light dynamical fermions. Folding 
in the prefactors we find that for two degenerate flavors of quarks with roughly the mass 
of the strange quark, full QCD simulations are a factor of 1000 to 2000 times slower. For 
smaller quark masses this factor will increase according to the above scaling behavior. As 
a result it is clear that we need improvements in update algorithms before contemplating 
realistic simulations with the full theory for the purpose of evaluating matrix elements 
within states made up of light hadrons. 



4. Lattice QCD is not an open-ended problem 

The masses of hadrons are very well measured experimentally. For this reason we 
know the different energy scales in the problem. To analyze the physics of light quarks 
($u$, $d$, $s$) there are three scales that we have to consider. First $L > \xi_{\rm maximum}$, and we take 
$\xi_{\rm maximum} = 1/M_\pi$ as the pion is the lightest particle. Current simulations tell us that for 
$L/\xi_{\rm maximum} \sim 5$ the finite size effects are down to a few percent level. Second, the lattice 
should be fine enough such that no essential features of the hadron's structure are missed 
as a result of discretizing the theory. This scale is controlled by $\xi_{\rm minimum}/a$. We choose 
$\xi_{\rm minimum}$ to be the reciprocal of the proton mass. Again current numerical data tell us that 
for $\xi_{\rm minimum}/a \sim 5$ finite lattice spacing errors are reduced to the level of a few percent. 
Lastly, $\xi_{\rm maximum}/\xi_{\rm minimum} = M_{\rm proton}/M_\pi = 7$ is an accurately measured number (getting 
this ratio correct in lattice simulations is equivalent to tuning $m_u$ to its physical value). 
Putting these three factors together tells us that definite measurements require lattices of 
size $L \sim 175$ in units of the lattice spacing. Thus, unless present analysis has led us to grossly underestimate the first 
two scales, definite calculations can be done in the quenched approximation on computers 
that can sustain 1-10 teraflops. 
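
The arithmetic behind this estimate is just the product of the three ratios. The following sketch spells it out; the function and parameter names are illustrative, and the default values are the ones quoted in this section.

```python
def lattice_points_per_side(l_over_xi_max=5.0, xi_ratio=7.0, xi_min_over_a=5.0):
    """Number of lattice points per side, L/a.

    L/a = (L / xi_max) * (xi_max / xi_min) * (xi_min / a).
    """
    return l_over_xi_max * xi_ratio * xi_min_over_a


def total_sites(points_per_side, dimensions=4):
    """Total number of sites on a hypercubic lattice with equal sides."""
    return points_per_side ** dimensions
```

With the defaults, `lattice_points_per_side()` returns 175.0, and `total_sites(175)` is about $9.4 \times 10^8$ sites for a symmetric four-dimensional lattice. Relaxing either of the factors of 5 reduces the required size linearly in $L$ but as the fourth power in the number of sites.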



5. Hadron Spectrum 

The first step towards the analysis of matrix elements is to calculate quark prop- 
agators. These quark propagators are combined to form hadron correlators. Matrix ele- 
ments are calculated by sandwiching the appropriate operator between the initial and final 
state hadrons. The quality of the results depends on how well one has isolated the desired 
hadronic states before inserting the operator, for example eliminated the radial excitations 
that contaminate the signal. To extract the matrix element from the correlation function 
one has to remove the external legs by dividing the 3-point function by 2-point functions. 
Thus, a necessary condition for getting accurate results is to enhance the signal in the 
2-point correlators, the quantities from which we extract decay constants and the energy 
of the state. It is therefore appropriate that as a prelude to presenting results for matrix 
elements I give a brief review of spectrum calculations. 

Calculations of the light hadron spectrum use three input parameters: two quark 
masses, $m_u$ and $m_s$ (we assume $m_u = m_d$), and the bare gauge coupling constant. The 
quark masses are adjusted to give the physical masses for the $\pi$ and $K$ mesons. In practice 
one adjusts the ratio of their mass to that of the proton and, as mentioned above, at present 
we have to make an extrapolation from heavier quark masses. If QCD is the correct 
theory of strong interactions then all other mass ratios should agree with experimental 
numbers as the bare gauge coupling is tuned to zero. Again we extrapolate $g_{\rm bare} \to 0$ 
using renormalization group scaling. The status of these calculations is summarized by 
Ukawa at the LATTICE92 meeting [9], and the most complete calculation to date is by Butler 
et al. [10]. 

The results show that finite size errors are down to a few percent level when 
$L/\xi_{\rm maximum} > 5$ and finite lattice spacing errors are of similar size for $\xi_{\rm minimum}/a > 5$. 
More importantly, the quenched results agree with experimental data to within 10%. This 
is a remarkable agreement considering the shift in rho mass due to $\rho \to \pi\pi$ decay has 
not been taken into account in setting the scale. For this reason I would like to see 
independent confirmation of the results of Butler et al. before declaring this aspect of 
spectrum calculations under control. In any case these results, in part, form the basis 
of my earlier conclusions on relevant scales. The finite $a$ errors are expected to be much 
larger in matrix element calculations as discussed later. 



6. Semi-leptonic form factors of heavy-light mesons from lattice QCD 

The semi-leptonic decays of mesons containing one heavy valence quark ($c$, $b$) and 
one light valence quark ($u$, $d$, $s$) may provide the most accurate determination of the flavor 
mixing angles. Consider the case $D^0 \to X^- l^+ \nu$, where $X$ has flavor content $s\bar u$ ($K$ or $K^*$). In 
the one $W$ exchange approximation the amplitude is 

$$ \langle X^- l^+ \nu | H_W | D^0 \rangle = \frac{G_F}{\sqrt{2}} V_{cs} \int d^4x \, \langle X^- l^+ \nu | (V - A)^{\rm lepton}_\mu (V - A)^{\rm hadron}_\mu | D^0 \rangle \qquad (6.1) $$

where $G_F$ is the Fermi constant and $V_{cs}$ is the $c \to s$ CKM matrix element. This process 
is particularly simple because the hadronic and leptonic currents factorize. The leptonic 
part of the decay can be calculated accurately using perturbation theory, while to take 
into account non-perturbative contributions to the hadronic part 

$$ H_\mu = \langle X | \bar s \gamma_\mu (1 - \gamma_5) c | D \rangle \qquad (6.2) $$

one resorts to lattice QCD. In this talk I will present our results for the case $D^0 \to K^- e^+ \nu$ 
as it is the simplest. 

6.1. $D^0 \to K^- e^+ \nu$ 

The matrix element $H_\mu$ can be parameterized in terms of two form factors: 

$$ \langle K^-(p_K) | \bar s \gamma_\mu (1 - \gamma_5) c | D^0(p_D) \rangle = p_\mu f_+(Q^2) + q_\mu f_-(Q^2), \qquad (6.3) $$

where $p = (p_D + p_K)$ and $q = (p_D - p_K)$ is the momentum carried away by the leptons, 
and $Q^2 = -q^2$ (which is always positive). I use the Euclidean notation $p = (\vec p, iE)$ so that 
$p^2 = \vec p^{\,2} - E^2$. An alternative parameterization is 



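$$ \langle K^-(p_K) | \bar s \gamma_\mu (1 - \gamma_5) c | D^0(p_D) \rangle = \left( p_\mu - \frac{m_D^2 - m_K^2}{Q^2} q_\mu \right) f_+(Q^2) + \frac{m_D^2 - m_K^2}{Q^2} q_\mu f_0(Q^2) \qquad (6.4) $$

where 

$$ f_0(Q^2) = f_+(Q^2) + \frac{Q^2}{m_D^2 - m_K^2} f_-(Q^2) . \qquad (6.5) $$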

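In the center of mass coordinate system for the lepton pair, i.e. $\vec q = 0$ or equivalently 
$\vec p_K = \vec p_D$, one has 

$$ \langle K^-(p_K) | \bar s \gamma_i c | D^0(p_D) \rangle = 2 p_{D,i} f_+(Q^2), \qquad \langle K^-(p_K) | \bar s \gamma_4 c | D^0(p_D) \rangle = \frac{m_D^2 - m_K^2}{\sqrt{Q^2}} f_0(Q^2) . \qquad (6.6) $$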



Thus, the form factor $f_+(Q^2)$ is associated with the exchange of a vector particle, while 
$f_0(Q^2)$ is associated with a scalar exchange. It is common to assume nearest pole domi- 
nance and make the hypothesis 

$$ f_+(Q^2) = \frac{f_+(0)}{1 - Q^2/m_{1^-}^2} , \qquad f_0(Q^2) = \frac{f_0(0)}{1 - Q^2/m_{0^+}^2} , \qquad (6.7) $$



where $m_{J^P}$ is the mass of the lightest resonance with the right quantum numbers to mediate 
the transition; $D_s^+(1969)$ or $D_s^{*+}(2110)$ in the pseudoscalar or vector channels respectively. 
The goal of the lattice calculations is to determine the normalizations $f_+(0)$ and $f_0(0)$ and 
map out the $Q^2$ dependence. 

In the limit of vanishing lepton masses, the vector channel dominates and one can 
write the differential decay rate as 

$$ \frac{d\Gamma}{dQ^2} = \frac{G_F^2 |V_{cs}|^2}{192 \pi^3 m_D^3} \, \lambda(Q^2)^{3/2} \, |f_+(Q^2)|^2 , \qquad \lambda(Q^2) = (m_D^2 + m_K^2 - Q^2)^2 - 4 m_D^2 m_K^2 . \qquad (6.8) $$

To integrate this, the functional form of $f_+(Q^2)$ must be known. Assuming vector meson 
dominance, numerical integration gives 

$$ \Gamma(D^0 \to K^- e^+ \nu) = 1.53 \, |V_{cs}|^2 |f_+(0)|^2 \times 10^{11} \ {\rm sec}^{-1} . \qquad (6.9) $$

Eqn. (6.9) is the simplest example of the master equation: using it we can extract $V_{cs}$ 
once $\Gamma(D^0 \to K^- e^+ \nu)$ has been measured and $f_+(0)$ calculated using lattice QCD. In this 
case, however, $|V_{cs}| = 0.975$ is known very accurately, so one extracts $|f_+(0)| \approx 0.75$. The 
quantity $f_0(0)$ has not been determined. 
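
The way Eqn. (6.9) acts as a master equation can be illustrated with a short sketch: given the rate and one of the two unknowns, it returns the other. The coefficient is read here as $1.53 \times 10^{11}\ {\rm sec}^{-1}$ (an interpretation of the printed power of ten, chosen because it reproduces the quoted $|f_+(0)| \approx 0.75$ for a rate of order $10^{11}\ {\rm sec}^{-1}$). The function names are illustrative, and no measured rate is assumed.

```python
import math

# Coefficient of Eqn. (6.9) in sec^-1; the power of ten is an assumed reading.
RATE_COEFF = 1.53e11


def decay_rate(v_cs, f_plus_0):
    """Rate in sec^-1 predicted by Eqn. (6.9) for given |V_cs| and |f_+(0)|."""
    return RATE_COEFF * abs(v_cs) ** 2 * abs(f_plus_0) ** 2


def f_plus_from_rate(rate, v_cs):
    """Solve Eqn. (6.9) for |f_+(0)| when the rate and |V_cs| are known."""
    return math.sqrt(rate / RATE_COEFF) / abs(v_cs)


def v_cs_from_rate(rate, f_plus_0):
    """Solve Eqn. (6.9) for |V_cs| when the rate and |f_+(0)| are known."""
    return math.sqrt(rate / RATE_COEFF) / abs(f_plus_0)
```

For instance, `decay_rate(0.975, 0.75)` gives the rate implied by the two numbers quoted above, and feeding that rate back through `f_plus_from_rate(rate, 0.975)` recovers 0.75. Because the rate depends on the product $|V_{cs}|^2 |f_+(0)|^2$, a fractional error on the lattice value of $f_+(0)$ carries over directly to the extracted $|V_{cs}|$.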

The details of our lattice calculation of the form-factors are given in Ref. [11], so 
here I briefly describe some of the lattice technicalities and present the results. I would like 
to emphasize that the results presented here are exploratory. The goal was to investigate 
different numerical techniques in order to improve the signal to noise ratio. The data 
confirm that the numerical techniques are now good enough to get reliable results with 
today's massively parallel computers. 

6.2. Lattice parameters 

Our statistical sample consists of 35 lattices of size $16^3 \times 40$ at $\beta = 6.0$ corre- 
sponding to a lattice spacing $a = 0.1$ fermi. We fix the heavy (charm) quark mass at 
$\kappa = 0.135$, and use only two values of the light quark mass, $\kappa = 0.154$ and 0.155. Us- 
ing $a^{-1} = 1.9$ GeV, this corresponds to a heavy-light meson of mass 1.59 and 1.54 GeV 
(about the mass of the physical charm quark) and to light-light pseudoscalar masses of 
roughly 690 MeV and 560 MeV. Our heavy-light pseudoscalar mesons therefore correspond 
most closely to the physical $D$ meson, with a somewhat massive light constituent, while 
the light-light mesons are analogous to the physical $K$. We will henceforth adopt this 
nomenclature. 
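
The conversion between lattice and physical units used above can be sketched as follows. The sketch takes the 1.9 GeV scale quoted in this subsection to be the inverse lattice spacing and uses $\hbar c \approx 0.19733$ GeV fm; the function names are illustrative.

```python
import math

HBARC_GEV_FM = 0.1973269804  # hbar * c in GeV * fm


def lattice_spacing_fm(a_inv_gev):
    """Lattice spacing in fermi for a given inverse spacing in GeV."""
    return HBARC_GEV_FM / a_inv_gev


def mass_in_gev(mass_lattice_units, a_inv_gev):
    """Convert a dimensionless lattice mass (m * a) to GeV."""
    return mass_lattice_units * a_inv_gev


def min_momentum_gev(n_spatial, a_inv_gev):
    """Smallest non-zero momentum, 2*pi/L, on a periodic lattice of n_spatial sites."""
    return 2.0 * math.pi / n_spatial * a_inv_gev
```

`lattice_spacing_fm(1.9)` is about 0.104 fermi, consistent with the $a = 0.1$ fermi quoted above, and `min_momentum_gev(16, 1.9)` is roughly 0.75 GeV. The latter shows how coarse the momentum resolution is on a lattice with 16 spatial sites, which limits how finely the momentum dependence of the form factors can be mapped.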

6.3. Quark propagators and 3-point Correlation function 

The calculation of quark propagators is done on lattices doubled in the time direc- 
tion, i.e. $16^3 \times 40 \to 16^3 \times 80$. We use periodic boundary conditions in all four directions. 
These propagators on doubled lattices are identical to forward and backward moving so- 
lutions on the original $16^3 \times 40$ lattice. To improve the signal we use the "Wuppertal" 
smeared source method for generating the propagators. 

In the 3-point correlation function the source for the $K$ meson is fixed at $t_K = 1$ 
and for the $D$ meson at $t_D = 32$. As a result the wrap-around effects in the time direction 
are exponentially damped by at least 18 time slices because of doubling the lattices. The 
position of the insertion of the vector current is varied over $4 < t < 28$ to improve the 
statistics. The lowest order Feynman diagram for this process is shown in Fig. 1a. Fig. 1b 
shows one possible correction term due to gluon interactions which make perturbative 
analysis of the matrix element hard. 

6.4. Operators and correlators 

In order to get a handle on $O(a)$ effects coming from the lattice operator we use 
three transcriptions for the vector current 

$$ V_\mu^{\rm local}(x) = \bar q_1(x) \gamma_\mu q_2(x), $$
$$ V_\mu^{\rm ext.}(x) = \frac{1}{2} \left( \bar q_1(x) \gamma_\mu U_\mu(x) q_2(x + a\hat\mu) + \bar q_1(x + a\hat\mu) \gamma_\mu U_\mu(x)^\dagger q_2(x) \right), \qquad (6.10) $$
$$ V_\mu^{\rm cons.}(x) = \frac{1}{2} \left( \bar q_1(x) (\gamma_\mu - 1) U_\mu(x) q_2(x + a\hat\mu) + \bar q_1(x + a\hat\mu) (\gamma_\mu + 1) U_\mu(x)^\dagger q_2(x) \right). $$

In our calculation the quarks $q_1$ and $q_2$ may both be light, or one heavy and one light. 
Note that $V_\mu^{\rm cons.}(x)$ is conserved only for degenerate quarks. We use the Lepage-Mackenzie 
improved normalization of these currents relative to the continuum vector current. The 
lattice field for a quark of flavor $i$ is related to its continuum counterpart by 



$$ \psi_i^{\rm cont} = \sqrt{8\kappa_c} \, \sqrt{1 - \frac{3\kappa_i}{4\kappa_c}} \; \psi_i^{L} \qquad (6.11) $$

where $\kappa_c = 0.15702$ is the value of the hopping parameter that corresponds to zero pion 
mass. To get the normalization of the local vector current we multiply the 1-loop pertur- 
bative result for the operator by that for $8\kappa_c$. This gives a better perturbative expansion 
as the large tadpole contributions (lattice artifacts) are cancelled. The result is 



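$$ \bar q_1(x) \gamma_\mu q_2(x) \Big|^{\rm cont} = \sqrt{1 - \frac{3\kappa_1}{4\kappa_c}} \, \sqrt{1 - \frac{3\kappa_2}{4\kappa_c}} \; (1 - 0.82\,\alpha_V) \; \bar q_1(x) \gamma_\mu q_2(x) \Big|^{L} \qquad (6.12) $$

where $\alpha_V = g_V^2/4\pi$ is the renormalized coupling, which we take to be $g_V^2 = 1.75\, g_{\rm bare}^2$. 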

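In the extended 1-link and conserved currents the tadpoles cancel, and to $O(\alpha_s)$ 
the relation between continuum and lattice operators is (the details are given in Ref. [11]) 

$$ V_\mu^{\rm cont} = 8\kappa_c \, \sqrt{1 - \frac{3\kappa_1}{4\kappa_c}} \, \sqrt{1 - \frac{3\kappa_2}{4\kappa_c}} \; (1 - 1.038\,\alpha_V) \, V_\mu^{{\rm ext.}\, L} \qquad (6.13) $$

and similarly for the conserved current 

$$ V_\mu^{\rm cont} = \sqrt{1 - \frac{3\kappa_1}{4\kappa_c}} \, \sqrt{1 - \frac{3\kappa_2}{4\kappa_c}} \; V_\mu^{\rm cons.} . \qquad (6.14) $$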



In the next sub-section I present our data and demonstrate that to get consistent results 
between the three lattice currents it is important to use these normalizations. 

Fig. 1. (A) The semi-leptonic decay of a $D^0$ meson to a $K^- l^+ \nu$ final state. The $c \to s$ 
transition takes place through the emission of a $W^+$ and only the vector part of the $V - A$ 
weak current contributes. The interaction is not pointlike at the hadronic vertex and its 
$Q^2$ dependence is given by the form-factors. (B) An example of QCD corrections to the 
matrix element $H_\mu$. 

6.5. Results 

I am going to skip over all the details of the analysis and the discussion of the 
quality of the signal in the correlators due to lack of time. These are given in Ref. [11]. The 
final results for the form-factors are given in Table 1. Our analysis shows that within their 
respective 1-$\sigma$ uncertainty the three different lattice transcriptions of the vector current 
give consistent results, and the difference between the local, extended and "conserved" 
currents can be taken to be a measure of the remaining $O(a)$ corrections. The numbers do 
not show a large variation for the two values of the light quark mass that we have used and 
the value of $f_+(Q^2)$ is roughly consistent with the phenomenological value $f_+(0) = 0.75$. 



$\kappa = 0.154$
Current           $f_+(Q^2 = 0.217)$   $f_-(Q^2 = 0.217)$   $f_0(Q^2 = 0.217)$   $f_0(Q^2 = -0.05)$
$V^{\rm Local}$   0.61(11)             -0.44(25)            0.66(13)             0.91(9)
$V^{\rm Ext.}$    0.68(12)             -0.41(24)            0.72(14)             1.01(11)
$V^{\rm Cons.}$   0.80(12)             -0.30(23)            0.83(13)             1.18(12)

$\kappa = 0.155$
Current           $f_+(Q^2 = 0.260)$   $f_-(Q^2 = 0.260)$   $f_0(Q^2 = 0.260)$   $f_0(Q^2 = -0.035)$
$V^{\rm Local}$   0.65(20)             -0.65(36)            0.69(21)             0.96(10)
$V^{\rm Ext.}$    0.66(24)             -0.52(36)            0.70(24)             1.04(11)
$V^{\rm Cons.}$   0.80(27)             -0.37(38)            0.82(27)             1.23(13)

Table 1: The data for semi-leptonic form-factors for each of the three definitions of the 
lattice vector current. The two values of light quark mass correspond to pions of roughly 
690 and 560 MeV. 

We can also compare our results with earlier calculations as these were done with 
similar lattice parameters. The group of Bernard et al. [12] measured the form-factors on 
$24^3 \times 40$ lattices at the same values of $\beta$ and $\kappa$. They used only the local vector current, 
and adopted a different normalization. Converting their result to the normalization we use 
gives $f_0(\vec p = 0) = 0.85(10)$ at $\kappa = 0.154$ to be compared with our value of 0.91(9). Similarly 
the Rome-Southampton group [13] [14] have measured the form-factors on $20 \times 10^2 \times 40$ 
lattices at the same value of $\beta$ and similar $\kappa$. They use the "conserved" vector current. 
Again, using the same normalization for the vector current that we use and interpolating 
their results to $\kappa = 0.154$, we find $f_+(\vec p = 2\pi/L) = 0.72(7)$ to be compared with our result 
of 0.80(12) and $f_0(\vec p = 2\pi/L) = 0.70(5)$ to be compared with 0.83(13). 

The internal consistency of our results and the agreement with previous calcula- 
tions show that semi-leptonic form-factors can be extracted from lattice simulations. The 
largest source of error in present results comes from $O(a)$ corrections and an inadequate 
signal in the non-zero momentum correlators. The next round of calculations is being 
done on $32^3 \times 64$ lattices on the CM5. These will hopefully address the phenomenologically 
interesting cases of the decay of $D$ to vector mesons and of $B \to \pi$ and $B \to D$ which are 
crucial for extracting $V_{ub}$ and $V_{cb}$ from the experimental data. 
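
The agreement with earlier calculations claimed above can be quantified with a minimal sketch that parses the `value(error)` notation and forms the difference in units of the combined error. It assumes the quoted errors are uncorrelated and can be added in quadrature, which is an assumption of this sketch rather than a statement from the references; the function names are illustrative.

```python
import math
import re


def parse_value(text):
    """Parse '0.85(10)' into (0.85, 0.10); the error applies to the last digits."""
    match = re.fullmatch(r"\s*(-?\d+)\.(\d+)\((\d+)\)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    whole, frac, err = match.groups()
    value = float(f"{whole}.{frac}")
    error = int(err) * 10 ** (-len(frac))
    return value, error


def pull(ours, theirs):
    """Difference of two results in units of their errors added in quadrature."""
    value_a, error_a = parse_value(ours)
    value_b, error_b = parse_value(theirs)
    return (value_a - value_b) / math.hypot(error_a, error_b)
```

Applied to the three comparisons quoted in the text, `pull("0.91(9)", "0.85(10)")`, `pull("0.80(12)", "0.72(7)")` and `pull("0.83(13)", "0.70(5)")`, all differences come out below one combined standard deviation. This is a statement about statistical compatibility only; it does not address the common $O(a)$ systematics discussed above.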

7. The kaon B parameter 

CP violation in the standard model is governed by a single parameter $\delta$ provided 
we assume that $\Theta = 0$. Once the value of $\delta$ is known then each CP violating process will 
provide a constraint involving the mixing angles and quark masses. I illustrate this using 
as an example the mixing between $K^0$ and $\bar K^0$ as it is the best measured CP violating 
process. 

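The mass eigenstates in the neutral kaon system are defined as 

$$ |K_L\rangle = \frac{1}{N} \left[ (1 + \epsilon) |K^0\rangle + (1 - \epsilon) |\bar K^0\rangle \right] $$
$$ |K_S\rangle = \frac{1}{N} \left[ (1 + \epsilon) |K^0\rangle - (1 - \epsilon) |\bar K^0\rangle \right] \qquad (7.1) $$

where $N$ is the normalization. The parameter $\epsilon$ measures the amount of CP violation, and 
in the standard model is given by the master equation [2] 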

e = 1.4e-/^ sin SBk hnfsi'm,, -m]^ + 7j2^/2(m,)Re( ^*^^^*%^-'^^- ) | (7.2) 

where $\eta_1 = 0.7$, $\eta_2 = 0.6$ and $\eta_3 = 0.4$ are the QCD correction factors and $f_2$ and $f_3$ are 
known functions of the quark masses. The value of $\epsilon$ is known experimentally to be 

$$ |\epsilon| = (2.258 \pm 0.018) \times 10^{-3} \qquad (7.3) $$

In Eq. (7.2) the strong interaction corrections are encapsulated in the parameter $B_K$ 
which is the ratio of the matrix element of the $\Delta S = 2$ four-fermion operator 
$(\bar s \gamma_\mu (1 - \gamma_5) d)(\bar s \gamma_\mu (1 - \gamma_5) d)$ to its value in the vacuum saturation approximation 

<:^|(I7m(1-75)^^)|0)(0|(I7m(1- 75)^^)1^) = ^flMlBK. (7.4) 

Theoretical estimates of this parameter vary from 0.33 to 1 and lattice calculations aim to 
provide a non-perturbative answer. 

The steps in the calculation leading to Eqn. (7.2) are shown in Fig. 2. In the 
standard model $K^0 \bar K^0$ mixing can occur due to the second order weak process shown in 
Fig. 2a. Since the $W$ and the top quark are heavy, it is expedient to integrate them out 
and define an effective 4-fermion interaction at some scale $\mu > m_c$. This is represented by 
the diagram in Fig. 2b. This weak amplitude is modified by strong interaction corrections 
as illustrated in Fig. 2c, and it is these corrections that change the value of $B_K$ from 1.0. 

The calculation of $B_K$ has been done with both staggered and Wilson fermions. 
At present simulations using staggered fermions are far more extensive and have much less 
theoretical uncertainty. The two formulations give consistent results [15], so I will present 
results only for staggered fermions as these have much smaller errors. The details of these 
calculations are given in Refs. [16] [17] [18]. Our final results from different lattices and 
for different values of $a$ are shown in Fig. 3. This calculation is sufficiently mature that 
one can analyze the data with respect to the 6 sources of errors discussed in Section 2. 

1. Statistical errors: Three independent samples of configurations have been analyzed 
at /3 = 6.0 and results for Bx are consistent within errors. Also, the Japanese 
group [18] have carried out a totally independent calculation and get the same 
results. I take this to indicate that the analysis of statistical errors is correct. 

2. Finite size errors: We have compared results on $16^3 \times 40$ lattices with those on 
$24^3 \times 40$ at $\beta = 6.0$ and on $18^3 \times 42$ lattices with those on $32^3 \times 48$ at $\beta = 6.2$. 
In both cases the results are consistent. Our conclusion is that finite size effects 
in the data presented in Fig. 3 are much smaller than the statistical errors and at 
most 1-2%. 

3. Finite lattice spacing errors: These errors come from both the lattice action and 
the operators used in the measurements. Fig. 3 shows two different extrapolations 
assuming corrections to be either $O(a)$ or $O(a^2)$. These two different ways of 
extrapolation yield $B_K = 0.44(4)$ versus $0.54(2)$ in the continuum limit. The 
uncertainty in the form of extrapolation to use is at present the largest source of 
error in the data. Preliminary analysis suggests that the corrections in staggered 
fermion data are $O(a^2)$. This will be checked by improving the statistics at $\beta = 6.4$ 
and doing another simulation at, say, $\beta = 6.6$. 

4. Extrapolation in $m_q$: The $K^0$ consists of $d$ and $\bar{s}$ valence quarks. In our calculations 
the values of $B_K$ are read off from a simulation in which the two quarks are almost 
degenerate, say both with mass $m_s/2$. We have done some tests by varying the two 
quark masses in the range $m_s/3$ to $3m_s$ to check for effects of using non-degenerate 
masses. So far our conclusion is that these are at most a few percent. Going to 
smaller masses becomes increasingly hard as it requires higher statistics and a 
larger lattice, but otherwise the calculation is the same. 

5. Quenched approximation: Two independent calculations have been done using 
lattices generated with 2 flavors of dynamical fermions [19] [20]. The quark mass 
in the update is $\sim m_s$. The results, though preliminary, are consistent within 
errors with the quenched data. Based on this comparison our present estimate 
is that quenching may introduce only a 5-10% correction, making $B_K$ one of 
the first quantities for which we expect lattice QCD to yield accurate results. To 
improve upon this first check we need to study the effect of tuning the quark mass to its 
physical value both in the update of lattices and in the valence quark propagator. 

6. Operator renormalization: The 1-loop calculation relating the lattice operator to 
the continuum has been done [21] [18], and the upshot of it is that including this 
factor reduces $B_K$ by about 6-7%. 

Fig. 2. (A) One of the two possible box diagrams responsible for the mixing between 
$K^0$ and $\bar{K}^0$. (B) The short distance interactions involving the W exchange and t quark 
intermediate state are replaced by the $\Delta S = 2$ 4-fermion effective interaction. (C) One possible 
QCD correction to the weak decay. Lattice QCD is a non-perturbative method to sum all 
such possible corrections. 
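
The sensitivity of the continuum value to the assumed form of the lattice-spacing correction (item 3) can be illustrated with a small sketch. The data points below are synthetic, generated from an exactly quadratic form, and are not the results of Fig. 3. The function name `fit_continuum` and the lattice-spacing values are hypothetical. The sketch fits y = y0 + c a^p by weighted least squares for p = 1 and p = 2 and compares the two intercepts at a = 0.

```python
def fit_continuum(a_vals, y_vals, y_errs, power):
    """Weighted least-squares fit of y = y0 + c * a**power.

    Returns (y0, c), where y0 is the value extrapolated to a = 0.
    """
    w = [1.0 / e ** 2 for e in y_errs]
    x = [a ** power for a in a_vals]
    s = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y_vals))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y_vals))
    det = s * sxx - sx * sx
    y0 = (sxx * sy - sx * sxy) / det
    c = (s * sxy - sx * sy) / det
    return y0, c


# Synthetic illustration only: y = 0.5 + 1.0 * a^2 at three made-up spacings.
a_vals = [0.10, 0.08, 0.06]
y_vals = [0.5 + 1.0 * a ** 2 for a in a_vals]
y_errs = [0.01, 0.01, 0.01]

y0_lin, _ = fit_continuum(a_vals, y_vals, y_errs, power=1)
y0_quad, c_quad = fit_continuum(a_vals, y_vals, y_errs, power=2)
print("O(a)   intercept:", round(y0_lin, 4))
print("O(a^2) intercept:", round(y0_quad, 4))
```

For points that fall toward smaller a with upward curvature, as in this toy set, the linear form extrapolates to a lower intercept than the quadratic one. That is the same direction as the 0.44 versus 0.54 difference quoted in item 3, though the size of the effect in real data depends on the actual points and errors.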

Finally, to make contact with phenomenology we have to remove the dependence 
on the renormalization point $\mu$ at which the effective theory is defined in the continuum. 
The $\mu$-independent parameter $\hat{B}_K$ is obtained by multiplying $B_K$ by a power of $\alpha_s$, and 
for $\beta = 6.0$ this correction factor is 1.34 with roughly a 10% uncertainty coming from the 
uncertainty in the lattice scale [22]. 

With all these estimates in hand our current estimate is $\hat{B}_K = 0.68(10)$. To get 
this I have used the $O(a^2)$ extrapolation for the $B_K$ data and have only included the operator 
renormalization factor as the other sources of systematic errors are smaller and less well 
determined. 
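
As a rough consistency check, the central value can be reproduced from the numbers quoted above: the $O(a^2)$ continuum value 0.54, a 1-loop renormalization reduction taken here as 6.5% (an assumed midpoint of the quoted 6-7%), and the factor 1.34 that removes the dependence on the renormalization point. Here $\hat{B}_K$ denotes the renormalization-point-independent parameter. How the quoted error of 0.10 was assembled is not spelled out in the talk; the last line only notes what the 10% uncertainty on the factor 1.34 would contribute on its own.

```latex
\begin{align*}
\hat{B}_K &\approx 0.54 \times (1 - 0.065) \times 1.34 \\
          &= 0.54 \times 0.935 \times 1.34 \approx 0.68, \\
\delta\hat{B}_K\big|_{\text{scale}} &\approx 0.10 \times 0.68 \approx 0.07 .
\end{align*}
```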

To conclude, I hope I have convinced you that lattice QCD calculations can play a 
very important role in our understanding of the standard model. The quality of results will 
be systematically improved with better numerical techniques and with bigger and faster 
computers. Therefore it is appropriate that I end this talk with a brief report on the status 
and performance of our QCD codes on the CM5. 

8. Optimization of QCD codes on the CM5 

We have finished the first phase of the development of QCD codes on the CM5. 
The overall strategy is to keep all the control structure in CMFortran under the SIMD 
programming environment. We isolate the computationally intensive portions of the code 
and convert them to CDPEAC. This way we are able to preserve modularity in order to 
implement changes in the algorithm and to add new measurement routines very quickly. 

The two key operations that capture the essence of QCD calculations are 

A = B + C * D 
A = B + C * cshift(D)                                        (8.1) 

where A, B, C, D are 3 x 3 complex matrices and the circular shift (cshift) is by ±1 
lattice units in one of the four directions. (The same amount of communication is done in 
all four directions.) The lattice size being used is $32^3 \times 64$ and we use single precision 
variables. Thus a typical array layout is A(:serial, :serial, :news, :news, :news, :news) 
with dimensions A(3, 3, 32, 32, 32, 64). At present the second operation is broken up into 
two parts 

tmp = cshift(D) 
A = B + C * tmp                                              (8.2) 

as there is no way to overlap communications with computations at the CMF level. The 
key lessons learned from optimizing the above two kinds of primitives are*: 

1. There is no discernible performance penalty for calls to CDPEAC routines. So the 
code can be made modular and portable by converting small compute intensive 
parts into CDPEAC subroutines. 

2. We vectorize over the sites. All loads and stores are joined with arithmetic opera- 
tions, so we reload variables as necessary. This allows us to optimize register use 
to get a long vector length. 

3. Each time we load a different array, say B after C, we pay a penalty of 5 cycles 
due to DRAM page faults. Since data elements in a vector load are contiguous in 
memory, there is no penalty within the vector operation. The DRAM page faults 
reduce the maximum possible speed from 64 to 50 MIPS/node. Other forms of 
data layout do not provide any significant improvement in performance and we do 
not recommend hand tuned layouts as they make the code much more complicated 
without any gain in speed. 

4. For on-node calculations we sustain approximately 50 Megaflops/node for 
multiplies or adds and 100 when we can chain multiply with add. Thus we are able to 
get optimal performance with a very simple vectorization and data layout strategy. 

5. By writing matrix multiply in CDPEAC we avoid single-precision loads and stores 
(this constitutes the bulk of the factor of 3-5 performance gain over CM Fortran) 
as complex numbers are double word aligned. Single stores should be avoided 
whenever possible. 

6. The cshift operation is slow due to off-node communication speed and because it 
does unnecessary memory to memory transfer of on-chip data. In SIMD mode 
the unnecessary moves can be avoided only by combining cshift with the matrix 
multiply. Also, part of the on-VU arithmetic can be done while the off-node data 
is in the network. This optimization step requires writing what is essentially a 
stencil in DPEAC, and we are currently implementing this with help from staff at 
Thinking Machines. 

* All tests and comparison timings were done using CM Fortran Driver Version: 2.1 Beta 
1 Rev: f2100 w/ release 2.1 beta 0.1. A number of inefficiencies have been fixed in Version: 2.1 
Beta 1 Rev: f2100 but we have not yet timed our codes under it. 
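
The two primitives of Eqs. (8.1) and (8.2) can be sketched in plain Python to make the data flow explicit. This is only an illustration on a toy lattice: it is not CM Fortran or CDPEAC, it says nothing about performance, and the dictionary storage and all function names are hypothetical. The shift convention assumed here is that of the Fortran 90 CSHIFT intrinsic, where a positive shift brings in the element from site x + shift with periodic wraparound; the talk does not state its convention.

```python
import itertools
import random

DIMS = (2, 2, 2, 4)  # toy lattice for illustration; the talk uses 32^3 x 64


def sites(dims):
    """Iterate over all site coordinates of a 4-d lattice."""
    return itertools.product(*(range(n) for n in dims))


def random_field(rng, dims):
    """One random 3x3 complex matrix per site (not a physical SU(3) field)."""
    return {
        x: [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(3)]
            for _ in range(3)]
        for x in sites(dims)
    }


def mat_mul_add(b, c, d):
    """Return b + c*d for 3x3 complex matrices stored as nested lists."""
    return [[b[i][j] + sum(c[i][k] * d[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]


def cshift(field, axis, shift, dims):
    """Circular shift with periodic wraparound: result[x] = field[x + shift]."""
    out = {}
    for x in field:
        y = list(x)
        y[axis] = (x[axis] + shift) % dims[axis]
        out[x] = field[tuple(y)]
    return out


def fma_local(B, C, D):
    """First primitive: A = B + C*D at every site, no communication."""
    return {x: mat_mul_add(B[x], C[x], D[x]) for x in B}


def fma_shifted(B, C, D, axis, shift, dims):
    """Second primitive, split in two steps: shift first, then multiply-add."""
    tmp = cshift(D, axis, shift, dims)
    return fma_local(B, C, tmp)


def flops_per_site():
    """Real flops for one 3x3 complex A = B + C*D.

    Complex multiply = 6 flops, complex add = 2 flops; each of the 9 entries
    needs 3 multiplies and 3 adds (2 inside the dot product, 1 for B).
    """
    return 9 * (3 * 6 + 3 * 2)


def field_bytes(dims, bytes_per_complex=8):
    """Storage for one field of 3x3 single-precision complex matrices."""
    n_sites = 1
    for n in dims:
        n_sites *= n
    return 9 * bytes_per_complex * n_sites


rng = random.Random(0)
B, C, D = (random_field(rng, DIMS) for _ in range(3))
A1 = fma_local(B, C, D)
A2 = fma_shifted(B, C, D, axis=3, shift=+1, dims=DIMS)
```

With these counts, each site costs 216 real floating-point operations per multiply-add, and one field on the $32^3 \times 64$ lattice (2,097,152 sites, 72 bytes per site) occupies about 151 MB in single precision.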

In conclusion, it is clear that developing an optimizing CMF compiler is hard and 
performance aficionados will have to program at the CDPEAC level for possibly the complete 
lifetime of the present architecture. Therefore, I have not discussed any of the inefficiencies 
of CMF that are removed by writing in CDPEAC. For those who are willing to write in 
CDPEAC there is additional reward as the CM5 is a stable high performance massively 
parallel computer. 